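Problem Description & Dataset Overview

Motivation:
Exam scores have long been one of the main measures of learning performance, and in many cases they are the primary gateway to further education.
"Educational equality" has long been a goal of social development, yet beyond a student's own effort and talent, many external factors also affect academic performance.
Which external factors influence students' grades?

Data table overview

Dataset: Student Alcohol Consumption (2008)
Source: Kaggle
Author: UCI MACHINE LEARNING
Description: A survey of students taking mathematics and Portuguese language courses at two secondary schools in Portugal.
This analysis uses the mathematics grade dataset.
It covers sex, age, family background, living environment, lifestyle habits, alcohol consumption and health status.
The dataset has 33 columns and a sample of 395 students.

Dataset URL: https://www.kaggle.com/datasets/uciml/student-alcohol-consumption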

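Column descriptions

Column      Type       Description
school      binary     School (GP or MS)
sex         binary     Student's sex (F = female, M = male)
age         numeric    Student's age
address     binary     Home address type (U = urban, R = rural)
famsize     binary     Family size (LE3 = 3 or fewer, GT3 = more than 3)
Pstatus     binary     Parents' cohabitation status (T = living together, A = apart)
Medu        numeric    Mother's education (0 = none, 1 = primary education (4th grade), 2 = 5th to 9th grade, 3 = secondary education, 4 = higher education)
Fedu        numeric    Father's education (0 = none, 1 = primary education (4th grade), 2 = 5th to 9th grade, 3 = secondary education, 4 = higher education)
Mjob        character  Mother's job (teacher, health care related, civil services (e.g. administrative or police), at_home, or other)
Fjob        character  Father's job (teacher, health care related, civil services (e.g. administrative or police), at_home, or other)
reason      character  Reason for choosing this school (home, school reputation, course preference, or other)
guardian    character  Student's guardian (mother, father, or other)
traveltime  numeric    Home-to-school travel time (1 = <15 min, 2 = 15 to 30 min, 3 = 30 min to 1 hour, 4 = >1 hour)
studytime   numeric    Weekly study time (1 = <2 hours, 2 = 2 to 5 hours, 3 = 5 to 10 hours, 4 = >10 hours)
failures    numeric    Number of past class failures (courses not passed)
schoolsup   binary     Extra educational support (yes or no)
famsup      binary     Family educational support (yes or no)
paid        binary     Extra paid classes within the course subject (yes or no)
activities  binary     Extracurricular activities (yes or no)
nursery     binary     Attended nursery school (yes or no)
higher      binary     Wants to take higher education (yes or no)
internet    binary     Internet access at home (yes or no)
romantic    binary     In a romantic relationship (yes or no)
famrel      numeric    Quality of family relationships (1 = very bad to 5 = excellent)
freetime    numeric    Free time after school (1 = very low to 5 = very high)
goout       numeric    Frequency of going out with friends (1 = very low to 5 = very high)
Dalc        numeric    Workday alcohol consumption (1 = very low to 5 = very high)
Walc        numeric    Weekend alcohol consumption (1 = very low to 5 = very high)
health      numeric    Health status (1 = very bad to 5 = very good)
absences    numeric    Number of school absences (0 to 93)
G1          numeric    First period grade (0 to 20)
G2          numeric    Second period grade (0 to 20)
G3          numeric    Final grade (0 to 20)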

Data Inspection & Cleaning

View the data
    school              sex                 age         address         
 Length:395         Length:395         Min.   :15.0   Length:395        
 Class :character   Class :character   1st Qu.:16.0   Class :character  
 Mode  :character   Mode  :character   Median :17.0   Mode  :character  
                                       Mean   :16.7                     
                                       3rd Qu.:18.0                     
                                       Max.   :22.0                     
   famsize            Pstatus               Medu            Fedu      
 Length:395         Length:395         Min.   :0.000   Min.   :0.000  
 Class :character   Class :character   1st Qu.:2.000   1st Qu.:2.000  
 Mode  :character   Mode  :character   Median :3.000   Median :2.000  
                                       Mean   :2.749   Mean   :2.522  
                                       3rd Qu.:4.000   3rd Qu.:3.000  
                                       Max.   :4.000   Max.   :4.000  
     Mjob               Fjob              reason            guardian        
 Length:395         Length:395         Length:395         Length:395        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
   traveltime      studytime        failures       schoolsup        
 Min.   :1.000   Min.   :1.000   Min.   :0.0000   Length:395        
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:0.0000   Class :character  
 Median :1.000   Median :2.000   Median :0.0000   Mode  :character  
 Mean   :1.448   Mean   :2.035   Mean   :0.3342                     
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:0.0000                     
 Max.   :4.000   Max.   :4.000   Max.   :3.0000                     
    famsup              paid            activities          nursery         
 Length:395         Length:395         Length:395         Length:395        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
    higher            internet           romantic             famrel     
 Length:395         Length:395         Length:395         Min.   :1.000  
 Class :character   Class :character   Class :character   1st Qu.:4.000  
 Mode  :character   Mode  :character   Mode  :character   Median :4.000  
                                                          Mean   :3.944  
                                                          3rd Qu.:5.000  
                                                          Max.   :5.000  
    freetime         goout            Dalc            Walc      
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000  
 Median :3.000   Median :3.000   Median :1.000   Median :2.000  
 Mean   :3.235   Mean   :3.109   Mean   :1.481   Mean   :2.291  
 3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:2.000   3rd Qu.:3.000  
 Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
     health         absences            G1              G2       
 Min.   :1.000   Min.   : 0.000   Min.   : 3.00   Min.   : 0.00  
 1st Qu.:3.000   1st Qu.: 0.000   1st Qu.: 8.00   1st Qu.: 9.00  
 Median :4.000   Median : 4.000   Median :11.00   Median :11.00  
 Mean   :3.554   Mean   : 5.709   Mean   :10.91   Mean   :10.71  
 3rd Qu.:5.000   3rd Qu.: 8.000   3rd Qu.:13.00   3rd Qu.:13.00  
 Max.   :5.000   Max.   :75.000   Max.   :19.00   Max.   :19.00  
       G3            gmeans      
 Min.   : 0.00   Min.   : 1.333  
 1st Qu.: 8.00   1st Qu.: 8.333  
 Median :11.00   Median :10.667  
 Mean   :10.42   Mean   :10.679  
 3rd Qu.:14.00   3rd Qu.:13.333  
 Max.   :20.00   Max.   :19.333  
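-> Most variables are categorical or numeric ratings on a 1~5 scale. gmeans is not one of the 33 source columns; it appears to be the mean of G1, G2 and G3 (for the first student, (5 + 6 + 6) / 3 = 5.67), which is why 34 variables are listed below.
Check data types and inspect the data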
'data.frame':   395 obs. of  34 variables:
 $ school    : chr  "GP" "GP" "GP" "GP" ...
 $ sex       : chr  "F" "F" "F" "F" ...
 $ age       : int  18 17 15 15 16 16 16 17 15 15 ...
 $ address   : chr  "U" "U" "U" "U" ...
 $ famsize   : chr  "GT3" "GT3" "LE3" "GT3" ...
 $ Pstatus   : chr  "A" "T" "T" "T" ...
 $ Medu      : int  4 1 1 4 3 4 2 4 3 3 ...
 $ Fedu      : int  4 1 1 2 3 3 2 4 2 4 ...
 $ Mjob      : chr  "at_home" "at_home" "at_home" "health" ...
 $ Fjob      : chr  "teacher" "other" "other" "services" ...
 $ reason    : chr  "course" "course" "other" "home" ...
 $ guardian  : chr  "mother" "father" "mother" "mother" ...
 $ traveltime: int  2 1 1 1 1 1 1 2 1 1 ...
 $ studytime : int  2 2 2 3 2 2 2 2 2 2 ...
 $ failures  : int  0 0 3 0 0 0 0 0 0 0 ...
 $ schoolsup : chr  "yes" "no" "yes" "no" ...
 $ famsup    : chr  "no" "yes" "no" "yes" ...
 $ paid      : chr  "no" "no" "yes" "yes" ...
 $ activities: chr  "no" "no" "no" "yes" ...
 $ nursery   : chr  "yes" "no" "yes" "yes" ...
 $ higher    : chr  "yes" "yes" "yes" "yes" ...
 $ internet  : chr  "no" "yes" "yes" "yes" ...
 $ romantic  : chr  "no" "no" "no" "yes" ...
 $ famrel    : int  4 5 4 3 4 5 4 4 4 5 ...
 $ freetime  : int  3 3 3 2 3 4 4 1 2 5 ...
 $ goout     : int  4 3 2 2 2 2 4 4 2 1 ...
 $ Dalc      : int  1 1 2 1 1 1 1 1 1 1 ...
 $ Walc      : int  1 1 3 1 2 2 1 1 1 1 ...
 $ health    : int  3 3 3 5 5 5 3 1 1 5 ...
 $ absences  : int  6 4 10 2 4 10 0 6 0 0 ...
 $ G1        : int  5 5 7 15 6 15 12 6 16 14 ...
 $ G2        : int  6 5 8 14 10 15 12 5 18 15 ...
 $ G3        : int  6 6 10 15 10 15 11 6 19 15 ...
 $ gmeans    : num  5.67 5.33 8.33 14.67 8.67 ...
[1] "Number of missing values: 0"
[1] 0

-> The data was checked for null and missing values and the counts were printed; this dataset has none.
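
The two preparation steps visible in the output above (a derived gmeans column and a missing-value count) can be sketched in Python as follows. This is a minimal sketch, not the original code: it assumes gmeans is the mean of G1, G2 and G3, which matches the first rows shown, and the sample rows and helper names are hypothetical. The original analysis appears to have been run in R.

```python
# Sketch only: two hand-copied rows stand in for the full 395-row table.
rows = [
    {"G1": 5, "G2": 6, "G3": 6},
    {"G1": 5, "G2": 5, "G3": 6},
]

def add_gmeans(row):
    """Return a copy of the row with gmeans = mean of G1, G2, G3 (assumed definition)."""
    out = dict(row)
    out["gmeans"] = (row["G1"] + row["G2"] + row["G3"]) / 3
    return out

def count_missing(rows):
    """Count cells that are None or an empty string."""
    return sum(1 for row in rows for value in row.values() if value in (None, ""))

rows = [add_gmeans(r) for r in rows]
print([round(r["gmeans"], 2) for r in rows])
print("Number of missing values:", count_missing(rows))
```

With the full table loaded, the same two helpers would be applied to every row before any modelling.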

Tests & Regression Models: Numeric Columns

Correlation matrix
Correlation with grade performance:
                   [,1]
age        -0.134589374
Medu        0.224259868
Fedu        0.175852135
traveltime -0.128197163
studytime   0.134564719
failures   -0.375758896
famrel      0.021652521
freetime    0.003773140
goout      -0.154511336
Dalc       -0.072508178
Walc       -0.088024671
health     -0.080380376
absences   -0.005908806
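-> The correlations with grade performance are listed above.
Negative correlation (strongest first): number of past failures (-0.38) > frequency of going out with friends (-0.15) > age (-0.13) > travel time to school (-0.13)
Positive correlation (strongest first): mother's education (0.22) > father's education (0.18) > study time (0.13)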
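
Each coefficient in the table is a Pearson correlation between one numeric column and the grade measure. A minimal Python sketch of that calculation is shown below; the function name and the toy values are illustrative assumptions and are not taken from the dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy values (not from the dataset): more failures, lower grade.
failures = [0, 0, 1, 2, 3]
grades = [14, 12, 10, 8, 6]
print(round(pearson(failures, grades), 3))
```

A negative result means the two columns move in opposite directions, as with failures and grades in the table.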

Pairwise correlations between columns:
                    age         Medu         Fedu   traveltime    studytime
age         1.000000000 -0.163658419 -0.163438069  0.070640721 -0.004140037
Medu       -0.163658419  1.000000000  0.623455112 -0.171639305  0.064944137
Fedu       -0.163438069  0.623455112  1.000000000 -0.158194054 -0.009174639
traveltime  0.070640721 -0.171639305 -0.158194054  1.000000000 -0.100909119
studytime  -0.004140037  0.064944137 -0.009174639 -0.100909119  1.000000000
failures    0.243665377 -0.236679963 -0.250408444  0.092238746 -0.173563031
famrel      0.053940096 -0.003914458 -0.001369727 -0.016807986  0.039730704
freetime    0.016434389  0.030890867 -0.012845528 -0.017024944 -0.143198407
goout       0.126963880  0.064094438  0.043104668  0.028539674 -0.063903675
Dalc        0.131124605  0.019834099  0.002386429  0.138325309 -0.196019263
Walc        0.117276052 -0.047123460 -0.012631018  0.134115752 -0.253784731
health     -0.062187369 -0.046877829  0.014741537  0.007500606 -0.075615863
absences    0.175230079  0.100284818  0.024472887 -0.012943775 -0.062700175
              failures       famrel    freetime        goout         Dalc
age         0.24366538  0.053940096  0.01643439  0.126963880  0.131124605
Medu       -0.23667996 -0.003914458  0.03089087  0.064094438  0.019834099
Fedu       -0.25040844 -0.001369727 -0.01284553  0.043104668  0.002386429
traveltime  0.09223875 -0.016807986 -0.01702494  0.028539674  0.138325309
studytime  -0.17356303  0.039730704 -0.14319841 -0.063903675 -0.196019263
failures    1.00000000 -0.044336626  0.09198747  0.124560922  0.136046931
famrel     -0.04433663  1.000000000  0.15070144  0.064568411 -0.077594357
freetime    0.09198747  0.150701444  1.00000000  0.285018715  0.209000848
goout       0.12456092  0.064568411  0.28501871  1.000000000  0.266993848
Dalc        0.13604693 -0.077594357  0.20900085  0.266993848  1.000000000
Walc        0.14196203 -0.113397308  0.14782181  0.420385745  0.647544230
health      0.06582728  0.094055728  0.07573336 -0.009577254  0.077179582
absences    0.06372583 -0.044354095 -0.05807792  0.044302220  0.111908026
                  Walc       health    absences
age         0.11727605 -0.062187369  0.17523008
Medu       -0.04712346 -0.046877829  0.10028482
Fedu       -0.01263102  0.014741537  0.02447289
traveltime  0.13411575  0.007500606 -0.01294378
studytime  -0.25378473 -0.075615863 -0.06270018
failures    0.14196203  0.065827282  0.06372583
famrel     -0.11339731  0.094055728 -0.04435409
freetime    0.14782181  0.075733357 -0.05807792
goout       0.42038575 -0.009577254  0.04430222
Dalc        0.64754423  0.077179582  0.11190803
Walc        1.00000000  0.092476317  0.13629110
health      0.09247632  1.000000000 -0.02993671
absences    0.13629110 -0.029936711  1.00000000

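1. Workday and weekend alcohol consumption (Dalc and Walc) are fairly strongly correlated (r = 0.65).
2. Mother's and father's education (Medu and Fedu) are also fairly strongly correlated (r = 0.62).

ANOVA: checking whether the highly correlated columns affect grades
ANOVA of alcohol consumption vs. math grade performance: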
             Df Sum Sq Mean Sq F value Pr(>F)  
Walc          1     42   41.72   3.069 0.0806 .
Residuals   393   5343   13.59                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
             Df Sum Sq Mean Sq F value Pr(>F)
Dalc          1     28   28.31   2.077   0.15
Residuals   393   5356   13.63               
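- Neither workday nor weekend alcohol consumption shows a statistically significant effect on grade performance at the 0.05 level (p = 0.15 and p = 0.0806).
-> To avoid hurting model accuracy, both can be left out of the regression model.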
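
Both tables above have Df = 1 for the predictor, which suggests Walc and Dalc were treated as numeric rather than as five-level factors. In that case the F statistic is a function of the Pearson correlation r and the sample size n: F = r^2 (n - 2) / (1 - r^2). The Python sketch below applies this to the correlations printed earlier; the function name is an illustrative assumption.

```python
def f_from_correlation(r, n):
    """F statistic of a one-predictor linear model, from Pearson r and sample size n."""
    r2 = r * r
    return r2 * (n - 2) / (1 - r2)

n = 395  # students in the dataset
# Correlations with grade performance, copied from the correlation table.
for name, r in [("Walc", -0.088024671), ("Dalc", -0.072508178)]:
    print(name, round(f_from_correlation(r, n), 3))
```

The results agree with the reported F values (3.069 and 2.077) to rounding, so these tests carry the same information as the correlations with grade performance.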

ANOVA of father's and mother's education vs. math grade performance:
             Df Sum Sq Mean Sq F value   Pr(>F)    
Fedu          1    167  166.51   12.54 0.000446 ***
Residuals   393   5218   13.28                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
             Df Sum Sq Mean Sq F value   Pr(>F)    
Medu          1    271  270.80   20.81 6.78e-06 ***
Residuals   393   5114   13.01                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
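- Both mother's and father's education show a significant effect on grade performance (p < 0.001).
-> They should be included in the regression model.

Regression models
Prediction using all numeric variables (only Dalc and Walc removed):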

Call:
lm(formula = gmeans ~ age + Medu + Fedu + traveltime + studytime + 
    failures + famrel + freetime + goout + health + absences, 
    data = student_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.0569 -2.1228  0.1757  2.2471  8.6242 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 11.713276   2.661103   4.402 1.40e-05 ***
age         -0.066310   0.142430  -0.466  0.64179    
Medu         0.423930   0.204181   2.076  0.03854 *  
Fedu         0.053162   0.203646   0.261  0.79419    
traveltime  -0.325462   0.249287  -1.306  0.19248    
studytime    0.291015   0.210385   1.383  0.16739    
failures    -1.521991   0.249374  -6.103 2.55e-09 ***
famrel       0.038369   0.193661   0.198  0.84305    
freetime     0.301203   0.182396   1.651  0.09948 .  
goout       -0.469382   0.162040  -2.897  0.00399 ** 
health      -0.155294   0.124187  -1.250  0.21189    
absences     0.008149   0.021924   0.372  0.71034    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.369 on 383 degrees of freedom
Multiple R-squared:  0.1927,    Adjusted R-squared:  0.1695 
F-statistic: 8.313 on 11 and 383 DF,  p-value: 3.968e-13
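Dalc and Walc are highly correlated with each other but show no significant effect on grades, so they are excluded.
With all remaining numeric columns included, the overall F-test p-value is very small (3.968e-13), so the model is significant as a whole, but its explanatory power is low (adjusted R-squared = 0.1695).

Prediction using only the variables with higher correlation: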

Call:
lm(formula = gmeans ~ age + Medu + Fedu + failures + traveltime + 
    goout, data = student_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.2480 -2.1065 -0.0019  2.2821  8.3447 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12.332195   2.442666   5.049 6.86e-07 ***
age         -0.039885   0.139294  -0.286   0.7748    
Medu         0.483759   0.201271   2.404   0.0167 *  
Fedu        -0.003631   0.202249  -0.018   0.9857    
failures    -1.577369   0.244427  -6.453 3.27e-10 ***
traveltime  -0.371541   0.248266  -1.497   0.1353    
goout       -0.399720   0.155660  -2.568   0.0106 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.372 on 388 degrees of freedom
Multiple R-squared:  0.1806,    Adjusted R-squared:  0.1679 
F-statistic: 14.25 on 6 and 388 DF,  p-value: 1.077e-14

The overall p-value is even smaller (1.077e-14), but explanatory power is still low (adjusted R-squared = 0.1679).
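
The adjusted R-squared values in both model summaries follow from the multiple R-squared, the sample size n and the number of predictors p: 1 - (1 - R^2)(n - 1) / (n - p - 1). A small Python sketch, using the rounded R-squared values printed above (the function name is an illustrative assumption):

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for a linear model with n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 395
print(round(adjusted_r_squared(0.1927, n, 11), 4))  # model with 11 numeric predictors
print(round(adjusted_r_squared(0.1806, n, 6), 4))   # model with 6 predictors
```

Dropping five predictors barely changes the adjusted value, which is why the smaller model is no worse than the larger one.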

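Summary
1. The regression models explain little of the variance, so linear regression appears to be a poor fit for this dataset: most values are ordinal levels and many columns are categorical, so the numeric columns alone cannot predict grade performance well.
2. Among the numeric columns, the number of past failures, the frequency of going out with friends and mother's education have the most influence on grades. Father's education is significant on its own but not once mother's education is in the model (p = 0.79 and p = 0.99), consistent with the two being correlated.
3. Next, the categorical columns are tested to see whether any of them stands out in its effect on grade performance.

Tests: Categorical Columns

Sex
Does sex affect grades?
F-test: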

    F test to compare two variances

data:  stu_female$gmeans and stu_male$gmeans
F = 0.9252, num df = 207, denom df = 186, p-value = 0.5849
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.6979068 1.2239012
sample estimates:
ratio of variances 
         0.9251996 
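-> p-value > 0.05, so the hypothesis of equal variances (var1 = var2) is not rejected and a pooled-variance t-test is used.

t-test: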

    Two Sample t-test

data:  student_data[, "gmeans"] by student_data[, "sex"]
t = -2.015, df = 393, p-value = 0.04459
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
 -1.4773516 -0.0181749
sample estimates:
mean in group F mean in group M 
       10.32532        11.07308 
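-> p-value < 0.05, so math grade performance differs by sex.

-> The box plot and dot plot show that boys score slightly higher than girls (mean 11.07 vs. 10.33).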
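
The two-step procedure above (a variance-ratio F statistic, then a pooled-variance t statistic) can be sketched in Python with the standard library. The function names and toy grades are illustrative assumptions; the sketch returns the statistics only, since the standard library has no F or t distribution for p-values.

```python
import math
import statistics

def variance_ratio(x, y):
    """F statistic for comparing two sample variances: var(x) / var(y)."""
    return statistics.variance(x) / statistics.variance(y)

def pooled_t(x, y):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, df

# Toy grades (not from the dataset) for two groups.
group_f = [9.0, 10.5, 11.0, 8.5, 12.0]
group_m = [10.0, 12.5, 11.5, 9.5, 13.0]
print(variance_ratio(group_f, group_m))
print(pooled_t(group_f, group_m))
```

For the real data the group sizes are 208 and 187, which gives the 393 degrees of freedom shown in the t-test output.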

Place of residence
Does place of residence (urban vs. rural) affect grades?
F-test:

    F test to compare two variances

data:  stu_radd$gmeans and stu_uadd$gmeans
F = 1.0232, num df = 87, denom df = 306, p-value = 0.8685
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.7412387 1.4587802
sample estimates:
ratio of variances 
          1.023217 
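-> p-value > 0.05, so the hypothesis of equal variances (var1 = var2) is not rejected and a pooled-variance t-test is used.

t-test: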

    Two Sample t-test

data:  student_data[, "gmeans"] by student_data[, "address"]
t = -2.1394, df = 393, p-value = 0.03302
alternative hypothesis: true difference in means between group R and group U is not equal to 0
95 percent confidence interval:
 -1.82688587 -0.07717099
sample estimates:
mean in group R mean in group U 
       9.939394       10.891422 
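-> p-value < 0.05, so math grade performance differs by place of residence.

-> The box plot and dot plot show that urban students score slightly higher than rural students (mean 10.89 vs. 9.94).

ANOVA: effect of father's and mother's job category on grades
Mother's job category: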
             Df Sum Sq Mean Sq F value  Pr(>F)   
Mjob          4    235   58.77   4.451 0.00158 **
Residuals   390   5149   13.20                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-> Grade performance differs significantly by mother's job category (p = 0.00158).
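
The table above is a one-way ANOVA with five job categories and 395 students, hence 4 and 390 degrees of freedom. A minimal Python sketch of how the F statistic is built from between-group and within-group sums of squares follows; the function name and the toy groups are illustrative assumptions.

```python
def one_way_anova(groups):
    """Return (df_between, df_within, F) for a list of numeric groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return df_between, df_within, f_stat

# Toy groups (not from the dataset).
print(one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

In the table, the same ratio is Mean Sq of Mjob divided by Mean Sq of Residuals, 58.77 / 13.20.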
Post-hoc test:
  Tukey multiple comparisons of means
    99% family-wise confidence level

Fit: aov(formula = gmeans ~ Mjob, data = student_data)

$Mjob
                       diff         lwr        upr     p adj
health-at_home    2.4725823 -0.09163225 5.03679675 0.0145850
other-at_home     0.2963898 -1.55014620 2.14292578 0.9846718
services-at_home  1.4444079 -0.50001728 3.38883303 0.1083584
teacher-at_home   1.5074031 -0.69466768 3.70947384 0.1660319
other-health     -2.1761925 -4.45154320 0.09915828 0.0158144
services-health  -1.0281744 -3.38366053 1.32731177 0.6082512
teacher-health   -0.9651792 -3.53746249 1.60710414 0.7339905
services-other    1.1480181 -0.39561858 2.69165475 0.1076394
teacher-other     1.2110133 -0.64671129 3.06873787 0.2067748
teacher-services  0.0629952 -1.89205841 2.01804881 0.9999718

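-> The largest difference is between mothers in health care and mothers at home (diff = 2.47, adjusted p = 0.015). The other-health pair is similar (adjusted p = 0.016). At the 99% family-wise level used here, every interval still includes 0.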

Plots:

[Box plots by mother's job category, apparently one each for G1, G2, G3 and gmeans. Group sizes: at_home 59, health 34, other 141, services 103, teacher 58. Median gmeans by group: at_home 9.67, health 12.83, other 10.00, services 11.33, teacher 10.83.]

-> Students whose mothers work in health care perform best.
-> Students whose mothers are at home perform worst.

KNN classification prediction:
          TrainFlag
           FALSE TRUE
  at_home     15   44
  health       8   26
  other       35  106
  services    26   77
  teacher     14   44
Accuracy per fold (Fold01 to Fold20):
 [1] 0.2500000 0.4500000 0.3000000 0.3500000 0.4000000 0.1500000 0.3500000
 [8] 0.1904762 0.3000000 0.3500000 0.4210526 0.2105263 0.1500000 0.2000000
[15] 0.5555556 0.3157895 0.4500000 0.1500000 0.2500000 0.4210526
[1] 0.3107226

-> Training data 75%, test data 25%.
-> Over the 20 folds shown, the best accuracy is 0.56, the worst is 0.15 and the mean is 0.31, so accuracy is low.
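
The fold accuracies printed above can be summarised, and compared with a simple majority-class baseline, with a few lines of Python. This is a sketch under stated assumptions: the accuracies and class counts are copied from the output above, the helper names are hypothetical, and the baseline (always predicting the most frequent category) is an added point of comparison, not part of the original analysis.

```python
import statistics

# Fold accuracies copied from the output above (mother's job category).
fold_accuracy = [
    0.2500000, 0.4500000, 0.3000000, 0.3500000, 0.4000000, 0.1500000, 0.3500000,
    0.1904762, 0.3000000, 0.3500000, 0.4210526, 0.2105263, 0.1500000, 0.2000000,
    0.5555556, 0.3157895, 0.4500000, 0.1500000, 0.2500000, 0.4210526,
]
# Class counts from the TrainFlag table (FALSE + TRUE per job category).
class_counts = {"at_home": 59, "health": 34, "other": 141, "services": 103, "teacher": 58}

def summarize(accuracies):
    """Return (best, worst, mean) of a list of accuracies."""
    return max(accuracies), min(accuracies), statistics.mean(accuracies)

def majority_baseline(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())

best, worst, mean = summarize(fold_accuracy)
print(round(best, 2), round(worst, 2), round(mean, 2))
print(round(majority_baseline(class_counts), 2))
```

Comparing the mean accuracy with the baseline shows whether the classifier does better than guessing the largest category every time.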

Father's job category:
             Df Sum Sq Mean Sq F value Pr(>F)
Fjob          4    105   26.27   1.941  0.103
Residuals   390   5279   13.54               
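-> Grade performance does not differ significantly by father's job category (p = 0.103 > 0.05).
Post-hoc test:

-> The largest gap is between teacher and other, but because the overall ANOVA is not significant it should be read as descriptive only.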

Plots:

[Box plots by father's job category, apparently one each for G1, G2, G3 and gmeans. Group sizes: at_home 20, health 18, other 217, services 111, teacher 29. Median gmeans by group: at_home 11.33, health 11.17, other 10.33, services 10.33, teacher 12.67.]

-> Students whose fathers are teachers perform best.
-> Students whose fathers work in the "other" category perform worst.

KNN classification prediction:
          TrainFlag
           FALSE TRUE
  at_home      5   15
  health       4   14
  other       54  163
  services    28   83
  teacher      7   22
Accuracy per fold (Fold01 to Fold20):
 [1] 0.3157895 0.5500000 0.3684211 0.5263158 0.4736842 0.4500000 0.3000000
 [8] 0.4736842 0.5000000 0.5263158 0.5500000 0.6000000 0.5238095 0.4000000
[15] 0.2105263 0.5714286 0.4500000 0.4285714 0.2105263 0.4000000
[1] 0.4414536

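-> Training data 75%, test data 25%.
-> Over the 20 folds shown, the best accuracy is 0.60, the worst is 0.21 and the mean is 0.44, higher than for mother's job category. Note that "other" alone covers 217 of the 395 fathers (55%), so class imbalance may account for much of this.

Summary
1. The tests on binary columns show that both sex and place of residence affect math grade performance:
-> Boys score higher than girls, and urban students score higher than rural students.
2. Among the job categories, students whose mothers work in health care and whose fathers are teachers perform best.
3. By KNN accuracy, father's job category appears more closely tied to grades than mother's. The ANOVA points the other way, however: only mother's job category is significant (p = 0.00158 vs. p = 0.103), so this conclusion should be treated with caution.
4. Next, association rule analysis is used to find combinations of factors that affect grade performance.

Association Rule Analysis

Inspect the data distribution & convert grade -> level: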

  [1] "level2" "level2" "level2" "level3" "level2" "level3" "level3" "level2"
  [9] "level4" "level3" "level2" "level3" "level3" "level3" "level4" "level3"
 [17] "level3" "level2" "level2" "level2" "level3" "level3" "level4" "level3"
 [25] "level2" "level2" "level3" "level4" "level3" "level3" "level3" "level4"
 [33] "level4" "level2" "level3" "level2" "level4" "level4" "level3" "level3"
 [41] "level2" "level3" "level4" "level2" "level2" "level2" "level3" "level4"
 [49] "level3" "level2" "level3" "level3" "level3" "level2" "level3" "level2"
 [57] "level3" "level3" "level2" "level4" "level3" "level2" "level2" "level2"
 [65] "level2" "level4" "level3" "level2" "level2" "level4" "level3" "level2"
 [73] "level2" "level3" "level3" "level2" "level3" "level3" "level2" "level1"
 [81] "level3" "level3" "level2" "level3" "level2" "level2" "level2" "level3"
 [89] "level3" "level2" "level2" "level4" "level2" "level3" "level3" "level2"
 [97] "level3" "level2" "level3" "level2" "level2" "level4" "level3" "level2"
[105] "level4" "level3" "level2" "level4" "level3" "level3" "level4" "level2"
[113] "level3" "level4" "level2" "level4" "level3" "level3" "level2" "level3"
[121] "level4" "level3" "level3" "level3" "level2" "level3" "level2" "level2"
[129] "level1" "level4" "level1" "level1" "level3" "level3" "level1" "level1"
[137] "level1" "level1" "level3" "level4" "level2" "level2" "level3" "level3"
[145] "level1" "level2" "level1" "level3" "level1" "level2" "level1" "level3"
[153] "level2" "level1" "level3" "level2" "level3" "level2" "level4" "level3"
[161] "level1" "level2" "level1" "level2" "level2" "level3" "level2" "level3"
[169] "level1" "level3" "level1" "level3" "level3" "level1" "level2" "level2"
[177] "level3" "level2" "level2" "level3" "level2" "level3" "level4" "level2"
[185] "level3" "level3" "level3" "level3" "level2" "level2" "level3" "level2"
[193] "level2" "level2" "level3" "level3" "level4" "level2" "level4" "level2"
[201] "level4" "level2" "level2" "level2" "level3" "level2" "level2" "level3"
[209] "level2" "level2" "level2" "level3" "level3" "level2" "level2" "level3"
[217] "level2" "level2" "level2" "level2" "level2" "level1" "level4" "level3"
[225] "level3" "level2" "level4" "level3" "level2" "level3" "level3" "level3"
[233] "level2" "level3" "level2" "level2" "level3" "level3" "level3" "level1"
[241] "level3" "level3" "level1" "level3" "level1" "level4" "level3" "level2"
[249] "level1" "level3" "level2" "level2" "level2" "level2" "level3" "level2"
[257] "level3" "level3" "level3" "level2" "level4" "level2" "level3" "level2"
[265] "level2" "level4" "level2" "level3" "level2" "level1" "level2" "level3"
[273] "level3" "level3" "level2" "level3" "level2" "level2" "level2" "level3"
[281] "level2" "level2" "level3" "level2" "level2" "level3" "level4" "level3"
[289] "level3" "level3" "level3" "level3" "level3" "level4" "level3" "level3"
[297] "level2" "level2" "level3" "level4" "level3" "level3" "level3" "level4"
[305] "level3" "level3" "level4" "level2" "level3" "level3" "level2" "level3"
[313] "level3" "level3" "level3" "level3" "level2" "level2" "level3" "level3"
[321] "level3" "level2" "level3" "level3" "level4" "level3" "level3" "level3"
[329] "level2" "level3" "level2" "level3" "level1" "level2" "level2" "level4"
[337] "level3" "level1" "level4" "level2" "level3" "level2" "level4" "level2"
[345] "level3" "level3" "level4" "level2" "level3" "level3" "level2" "level3"
[353] "level2" "level2" "level3" "level2" "level3" "level3" "level2" "level4"
[361] "level3" "level3" "level3" "level4" "level3" "level2" "level3" "level1"
[369] "level3" "level3" "level2" "level3" "level3" "level2" "level4" "level2"
[377] "level3" "level2" "level3" "level2" "level3" "level2" "level3" "level1"
[385] "level2" "level2" "level2" "level1" "level2" "level1" "level2" "level4"
[393] "level2" "level3" "level2"
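-> Mean grades can range from 0 to 20 and are mostly concentrated between 6 and 14. To distinguish levels of performance, the levels are defined as follows. Judging from the output above, a non-integer mean between two ranges goes to the higher level (for example, 5.33 is level2).
0~5 (level1)
6~10 (level2)
11~15 (level3)
16~20 (level4)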
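
The grade-to-level conversion can be sketched in Python as a lookup over the break points 5, 10 and 15. Treating each interval as closed on the right (so 5.0 is level1 and 5.33 is level2) is an assumption inferred from the first values in the output; the names BREAKS, LABELS and grade_level are hypothetical.

```python
from bisect import bisect_left

BREAKS = [5, 10, 15]  # upper bounds of level1, level2 and level3
LABELS = ["level1", "level2", "level3", "level4"]

def grade_level(mean_grade):
    """Map a mean grade in [0, 20] to a level, using right-closed intervals."""
    if not 0 <= mean_grade <= 20:
        raise ValueError("mean grade must be between 0 and 20")
    return LABELS[bisect_left(BREAKS, mean_grade)]

# First four gmeans values from the str() output earlier in the report.
print([grade_level(g) for g in [5.67, 5.33, 8.33, 14.67]])
```

For these four values the sketch gives level2, level2, level2, level3, the same as the first four entries of the printed vector.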

Inspect the data distribution & convert age -> level:

  [1] "18~19y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
  [9] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [17] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [25] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [33] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [41] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [49] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [57] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [65] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [73] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [81] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [89] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
 [97] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[105] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[113] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[121] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y"
[129] "18~19y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[137] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[145] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y" "15~17y"
[153] "15~17y" "18~19y" "15~17y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y"
[161] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[169] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[177] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[185] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[193] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[201] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y"
[209] "15~17y" "15~17y" "18~19y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y"
[217] "15~17y" "18~19y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y"
[225] "15~17y" "18~19y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y" "15~17y"
[233] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y"
[241] "15~17y" "15~17y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y" "20~22y"
[249] "18~19y" "15~17y" "18~19y" "15~17y" "18~19y" "15~17y" "15~17y" "15~17y"
[257] "15~17y" "18~19y" "18~19y" "15~17y" "18~19y" "18~19y" "18~19y" "15~17y"
[265] "18~19y" "18~19y" "15~17y" "18~19y" "18~19y" "18~19y" "18~19y" "18~19y"
[273] "18~19y" "15~17y" "15~17y" "15~17y" "18~19y" "18~19y" "18~19y" "18~19y"
[281] "15~17y" "15~17y" "18~19y" "18~19y" "15~17y" "15~17y" "18~19y" "15~17y"
[289] "18~19y" "18~19y" "18~19y" "15~17y" "18~19y" "15~17y" "18~19y" "15~17y"
[297] "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "15~17y" "15~17y" "15~17y"
[305] "18~19y" "18~19y" "20~22y" "18~19y" "18~19y" "18~19y" "18~19y" "18~19y"
[313] "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "15~17y" "18~19y"
[321] "15~17y" "15~17y" "15~17y" "15~17y" "15~17y" "18~19y" "15~17y" "15~17y"
[329] "15~17y" "15~17y" "18~19y" "15~17y" "18~19y" "18~19y" "18~19y" "15~17y"
[337] "18~19y" "15~17y" "18~19y" "15~17y" "18~19y" "18~19y" "18~19y" "15~17y"
[345] "18~19y" "18~19y" "18~19y" "18~19y" "15~17y" "18~19y" "18~19y" "15~17y"
[353] "18~19y" "18~19y" "15~17y" "18~19y" "15~17y" "15~17y" "18~19y" "18~19y"
[361] "18~19y" "18~19y" "18~19y" "15~17y" "15~17y" "18~19y" "18~19y" "15~17y"
[369] "18~19y" "18~19y" "18~19y" "18~19y" "15~17y" "15~17y" "18~19y" "18~19y"
[377] "20~22y" "18~19y" "18~19y" "15~17y" "18~19y" "18~19y" "15~17y" "18~19y"
[385] "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "18~19y" "20~22y" "15~17y"
[393] "20~22y" "18~19y" "18~19y"
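-> Ages range from 15 to 22 and are mostly concentrated between 16 and 18. To separate secondary-school ages from university ages (first to second year, third to fourth year), the levels are:
15~17 years
18~19 years
20~22 years

Inspect the data distribution & convert absences -> level: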

[1] 395

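-> Although absences are documented as ranging from 0 to 93, the maximum in this dataset is 75, and the median and Q1~Q3 show that most values fall below 10, so the levels are:
0~3 absences
4~6 absences
7~10 absences
11 or more absences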
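
The two binning schemes above (age and absences) can be sketched as follows. This is only an illustration in Python, not the code used to produce the report. The helper `to_level` is a hypothetical name. The labels `15~17y`, `18~19y`, `20~22y`, `0~3`, `4~6` and `11up` are taken from the printed output, while `7~10` is an assumed label for the third absence bin.

```python
def to_level(value, bins):
    """Map a number to the label of the inclusive [low, high] bin that contains it."""
    for low, high, label in bins:
        if low <= value <= high:
            return label
    raise ValueError(f"{value} is outside every bin")

# Age bins as described in the text.
AGE_BINS = [(15, 17, "15~17y"), (18, 19, "18~19y"), (20, 22, "20~22y")]

# Absence bins as described in the text; "7~10" is an assumed label.
ABSENCE_BINS = [
    (0, 3, "0~3"),
    (4, 6, "4~6"),
    (7, 10, "7~10"),
    (11, float("inf"), "11up"),
]

ages = [18, 17, 15, 22]
absences = [6, 4, 10, 0, 75]

age_levels = [to_level(a, AGE_BINS) for a in ages]
absence_levels = [to_level(a, ABSENCE_BINS) for a in absences]

print(age_levels)
print(absence_levels)
```

Keeping the bin edges in a plain list makes it easy to try other cut points and see how the resulting rules change.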

Tidy the data:
'data.frame':   395 obs. of  31 variables:
 $ school    : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
 $ sex       : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
 $ address   : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
 $ famsize   : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
 $ Pstatus   : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
 $ Medu      : Factor w/ 5 levels "0","1","2","3",..: 5 2 2 5 4 5 3 5 4 4 ...
 $ Fedu      : Factor w/ 5 levels "0","1","2","3",..: 5 2 2 3 4 4 3 5 3 5 ...
 $ Mjob      : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
 $ Fjob      : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
 $ reason    : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
 $ guardian  : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
 $ traveltime: Factor w/ 4 levels "1","2","3","4": 2 1 1 1 1 1 1 2 1 1 ...
 $ studytime : Factor w/ 4 levels "1","2","3","4": 2 2 2 3 2 2 2 2 2 2 ...
 $ failures  : Factor w/ 4 levels "0","1","2","3": 1 1 4 1 1 1 1 1 1 1 ...
 $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
 $ famsup    : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
 $ paid      : Factor w/ 2 levels "no","yes": 1 1 2 2 2 2 1 1 2 2 ...
 $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
 $ nursery   : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
 $ higher    : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
 $ internet  : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
 $ romantic  : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
 $ famrel    : Factor w/ 5 levels "1","2","3","4",..: 4 5 4 3 4 5 4 4 4 5 ...
 $ freetime  : Factor w/ 5 levels "1","2","3","4",..: 3 3 3 2 3 4 4 1 2 5 ...
 $ goout     : Factor w/ 5 levels "1","2","3","4",..: 4 3 2 2 2 2 4 4 2 1 ...
 $ Dalc      : Factor w/ 5 levels "1","2","3","4",..: 1 1 2 1 1 1 1 1 1 1 ...
 $ Walc      : Factor w/ 5 levels "1","2","3","4",..: 1 1 3 1 2 2 1 1 1 1 ...
 $ health    : Factor w/ 5 levels "1","2","3","4",..: 3 3 3 5 5 5 3 1 1 5 ...
 $ glevel    : Factor w/ 4 levels "level1","level2",..: 2 2 2 3 2 3 3 2 4 3 ...
 $ agelevel  : Factor w/ 3 levels "15~17y","18~19y",..: 2 1 1 1 1 1 1 1 1 1 ...
 $ abslevel  : Factor w/ 4 levels "0~3","11up","4~6",..: 3 3 4 1 3 4 1 3 1 1 ...
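All variables are converted to Factor type.
The original numeric columns that were not converted to intervals are dropped (age, absences, G1, G2 and G3 no longer appear in the output above) and the binned columns are added, so that association rule analysis can be run.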
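
To make the link between factor columns and association rules concrete, here is a minimal Python sketch of how one record becomes a transaction of `column=value` items, which is the form the items take in the rule output. The function name `row_to_items` is hypothetical. The sample values mirror a few columns of the first row in the `str` output above.

```python
def row_to_items(row):
    """Turn one record (column -> category) into a set of 'column=value' items."""
    return {f"{column}={value}" for column, value in row.items()}

# A few columns of the first student, read off the str() output above.
student = {
    "school": "GP",
    "sex": "F",
    "failures": "0",
    "glevel": "level2",
    "agelevel": "18~19y",
    "abslevel": "4~6",
}

items = row_to_items(student)
print(sorted(items))
```

Every distinct `column=value` pair is one item, which is why all columns have to be categorical before rules can be mined.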

Generate rules and plot:
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.3      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 118 

set item appearances ...[4 item(s)] done [0.00s].
set transactions ...[106 item(s), 395 transaction(s)] done [0.00s].
sorting and recoding items ... [36 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 done [0.02s].
writing ... [8 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
    lhs                rhs               support confidence  coverage     lift count
[1] {failures=0,                                                                    
     schoolsup=no,                                                                  
     higher=yes,                                                                    
     internet=yes}  => {glevel=level3} 0.3012658  0.5242291 0.5746835 1.225269   119
[2] {failures=0,                                                                    
     schoolsup=no,                                                                  
     internet=yes}  => {glevel=level3} 0.3063291  0.5170940 0.5924051 1.208593   121
[3] {school=GP,                                                                     
     failures=0,                                                                    
     schoolsup=no,                                                                  
     higher=yes}    => {glevel=level3} 0.3012658  0.5085470 0.5924051 1.188616   119
[4] {failures=0,                                                                    
     schoolsup=no,                                                                  
     higher=yes}    => {glevel=level3} 0.3392405  0.5056604 0.6708861 1.181869   134
[5] {Pstatus=T,                                                                     
     failures=0,                                                                    
     schoolsup=no,                                                                  
     higher=yes}    => {glevel=level3} 0.3063291  0.5041667 0.6075949 1.178378   121
[6] {failures=0,                                                                    
     schoolsup=no}  => {glevel=level3} 0.3443038  0.5000000 0.6886076 1.168639   136
[7] {school=GP,                                                                     
     failures=0,                                                                    
     schoolsup=no}  => {glevel=level3} 0.3037975  0.5000000 0.6075949 1.168639   120
[8] {Pstatus=T,                                                                     
     failures=0,                                                                    
     schoolsup=no}  => {glevel=level3} 0.3113924  0.5000000 0.6227848 1.168639   123

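Because a support threshold of 0.2 or lower yields more than 1,000 rules,
support is set to 0.3, which produces 8 rules, all with lift above 1.15.
All eight rules point to level3 (grades in the 11~15 range).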
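
As a sanity check on the columns of the rule table, the figures for rule [6] can be recomputed by hand. The sketch below is in Python and uses the counts from the output. The LHS count of 272 is derived from the reported coverage (0.6886076 × 395), and the level3 count of 169 is an assumption inferred from the reported lift, since the source does not state it.

```python
import math

n = 395              # transactions, from the Apriori log
both_count = 136     # "count" column of rule [6]: LHS and RHS together
lhs_count = 272      # derived from coverage 0.6886076 * 395
rhs_count = 169      # ASSUMPTION: number of level3 students, inferred from lift

support = both_count / n             # share of all students matching the whole rule
coverage = lhs_count / n             # share of all students matching the LHS
confidence = both_count / lhs_count  # share of LHS students who are level3
lift = confidence / (rhs_count / n)  # confidence relative to the level3 base rate

# The log reports 118 for a threshold of 0.3, which matches rounding 118.5 down.
displayed_min_count = math.floor(0.3 * n)

print(round(support, 7), round(confidence, 7), round(coverage, 7), round(lift, 6))
print(displayed_min_count)
```

A lift of about 1.17 means that students matching the left-hand side are level3 about 17% more often than students overall, which is a modest association.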

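The rule plot shows that "no past course failures, no extra educational support, wanting to pursue higher education, and internet access at home" go together with higher grades.
(Grades in the dataset mostly fall within 6~14 and level3 covers 11~15, hence this interpretation.)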

Generate rules for glevel2 and plot:
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.1      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 39 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[106 item(s), 395 transaction(s)] done [0.00s].
sorting and recoding items ... [82 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.52s].
writing ... [32 rule(s)] done [0.01s].
creating S4 object  ... done [0.01s].
     lhs                  rhs               support confidence  coverage     lift count
[1]  {school=GP,                                                                       
      sex=F,                                                                           
      studytime=2,                                                                     
      agelevel=15~17y} => {glevel=level2} 0.1012658  0.5333333 0.1898734 1.413870    40
[2]  {school=GP,                                                                       
      sex=F,                                                                           
      studytime=2,                                                                     
      higher=yes,                                                                      
      agelevel=15~17y} => {glevel=level2} 0.1012658  0.5333333 0.1898734 1.413870    40
[3]  {sex=F,                                                                           
      traveltime=1,                                                                    
      studytime=2}     => {glevel=level2} 0.1012658  0.5263158 0.1924051 1.395267    40
[4]  {sex=F,                                                                           
      Fjob=other,                                                                      
      famsup=yes}      => {glevel=level2} 0.1012658  0.5263158 0.1924051 1.395267    40
[5]  {school=GP,                                                                       
      sex=F,                                                                           
      studytime=2,                                                                     
      higher=yes}      => {glevel=level2} 0.1265823  0.5263158 0.2405063 1.395267    50
[6]  {school=GP,                                                                       
      sex=F,                                                                           
      studytime=2,                                                                     
      nursery=yes,                                                                     
      higher=yes}      => {glevel=level2} 0.1012658  0.5263158 0.1924051 1.395267    40
[7]  {school=GP,                                                                       
      sex=F,                                                                           
      studytime=2}     => {glevel=level2} 0.1316456  0.5252525 0.2506329 1.392448    52
[8]  {school=GP,                                                                       
      sex=F,                                                                           
      address=U,                                                                       
      studytime=2,                                                                     
      higher=yes}      => {glevel=level2} 0.1139241  0.5172414 0.2202532 1.371210    45
[9]  {school=GP,                                                                       
      sex=F,                                                                           
      address=U,                                                                       
      studytime=2}     => {glevel=level2} 0.1164557  0.5168539 0.2253165 1.370183    46
[10] {sex=F,                                                                           
      Fjob=other,                                                                      
      agelevel=15~17y} => {glevel=level2} 0.1012658  0.5128205 0.1974684 1.359491    40
[11] {school=GP,                                                                       
      sex=F,                                                                           
      studytime=2,                                                                     
      nursery=yes}     => {glevel=level2} 0.1012658  0.5128205 0.1974684 1.359491    40
[12] {sex=F,                                                                           
      Fjob=other,                                                                      
      higher=yes,                                                                      
      agelevel=15~17y} => {glevel=level2} 0.1012658  0.5128205 0.1974684 1.359491    40
[13] {sex=F,                                                                           
      studytime=2,                                                                     
      agelevel=15~17y} => {glevel=level2} 0.1037975  0.5125000 0.2025316 1.358641    41
[14] {sex=F,                                                                           
      studytime=2,                                                                     
      higher=yes,                                                                      
      agelevel=15~17y} => {glevel=level2} 0.1037975  0.5125000 0.2025316 1.358641    41
[15] {Fedu=1}          => {glevel=level2} 0.1063291  0.5121951 0.2075949 1.357833    42
[16] {school=GP,                                                                       
      sex=F,                                                                           
      paid=yes,                                                                        
      nursery=yes}     => {glevel=level2} 0.1088608  0.5119048 0.2126582 1.357063    43
[17] {sex=F,                                                                           
      Pstatus=T,                                                                       
      paid=yes,                                                                        
      nursery=yes}     => {glevel=level2} 0.1088608  0.5119048 0.2126582 1.357063    43
[18] {school=GP,                                                                       
      sex=F,                                                                           
      paid=yes,                                                                        
      nursery=yes,                                                                     
      higher=yes}      => {glevel=level2} 0.1088608  0.5119048 0.2126582 1.357063    43
[19] {sex=F,                                                                           
      Pstatus=T,                                                                       
      paid=yes,                                                                        
      nursery=yes,                                                                     
      higher=yes}      => {glevel=level2} 0.1088608  0.5119048 0.2126582 1.357063    43
[20] {school=GP,                                                                       
      sex=F,                                                                           
      Pstatus=T,                                                                       
      studytime=2,                                                                     
      higher=yes}      => {glevel=level2} 0.1037975  0.5061728 0.2050633 1.341868    41
[21] {sex=F,                                                                           
      studytime=2,                                                                     
      nursery=yes,                                                                     
      higher=yes}      => {glevel=level2} 0.1113924  0.5057471 0.2202532 1.340739    44
[22] {sex=F,                                                                           
      paid=yes,                                                                        
      nursery=yes}     => {glevel=level2} 0.1189873  0.5053763 0.2354430 1.339756    47
[23] {sex=F,                                                                           
      paid=yes,                                                                        
      nursery=yes,                                                                     
      higher=yes}      => {glevel=level2} 0.1189873  0.5053763 0.2354430 1.339756    47
[24] {sex=F,                                                                           
      address=U,                                                                       
      studytime=2,                                                                     
      higher=yes}      => {glevel=level2} 0.1215190  0.5052632 0.2405063 1.339456    48
[25] {sex=F,                                                                           
      address=U,                                                                       
      studytime=2}     => {glevel=level2} 0.1240506  0.5051546 0.2455696 1.339168    49
[26] {sex=F,                                                                           
      studytime=2,                                                                     
      higher=yes}      => {glevel=level2} 0.1392405  0.5045872 0.2759494 1.337664    55
[27] {sex=F,                                                                           
      studytime=2}     => {glevel=level2} 0.1443038  0.5044248 0.2860759 1.337233    57
[28] {higher=yes,                                                                      
      goout=4}         => {glevel=level2} 0.1063291  0.5000000 0.2126582 1.325503    42
[29] {studytime=2,                                                                     
      freetime=3}      => {glevel=level2} 0.1037975  0.5000000 0.2075949 1.325503    41
[30] {Fjob=other,                                                                      
      higher=yes,                                                                      
      freetime=3}      => {glevel=level2} 0.1037975  0.5000000 0.2075949 1.325503    41
[31] {school=GP,                                                                       
      sex=F,                                                                           
      Pstatus=T,                                                                       
      studytime=2}     => {glevel=level2} 0.1063291  0.5000000 0.2126582 1.325503    42
[32] {sex=F,                                                                           
      famsize=GT3,                                                                     
      Fjob=other,                                                                      
      higher=yes}      => {glevel=level2} 0.1012658  0.5000000 0.2025316 1.325503    40
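Rules only appear once support is lowered to 0.1, which yields 32 rules, all with lift above 1.3.
The rule plot shows that "being female, being aged 15~17, studying relatively little on one's own each week (2~5 hours), and attending the GP school" go together with lower grades.
(Grades in the dataset mostly fall within 6~14 and level2 covers 6~10, hence this interpretation.)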
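
Several of the 32 rules differ only by an extra item that does not change the confidence (for example rules [1] and [2]). The Python sketch below shows one way to flag such rules. It uses one common definition of redundancy: a rule is redundant if another listed rule with a strictly smaller left-hand side has at least the same confidence. The report does not say whether any such pruning was applied, and the helper name `redundant_ids` is hypothetical. Only five of the listed rules are used, and all of them have `glevel=level2` on the right-hand side.

```python
# rule id -> (left-hand side, confidence), copied from the output above
rules = {
    1: (frozenset({"school=GP", "sex=F", "studytime=2", "agelevel=15~17y"}), 0.5333333),
    2: (frozenset({"school=GP", "sex=F", "studytime=2", "higher=yes",
                   "agelevel=15~17y"}), 0.5333333),
    5: (frozenset({"school=GP", "sex=F", "studytime=2", "higher=yes"}), 0.5263158),
    7: (frozenset({"school=GP", "sex=F", "studytime=2"}), 0.5252525),
    27: (frozenset({"sex=F", "studytime=2"}), 0.5044248),
}

def redundant_ids(rules):
    """Return ids of rules beaten or matched by a rule with a strictly smaller LHS."""
    flagged = []
    for rule_id, (lhs, conf) in rules.items():
        for other_lhs, other_conf in rules.values():
            # "<" on frozensets tests for a proper subset
            if other_lhs < lhs and other_conf >= conf:
                flagged.append(rule_id)
                break
    return sorted(flagged)

flagged = redundant_ids(rules)
print(flagged)
```

In this sample only rule [2] is flagged: adding `higher=yes` to rule [1] leaves support and confidence unchanged, so the extra item carries no additional information here.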

Generate rules for glevel1:
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.1      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 39 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[106 item(s), 395 transaction(s)] done [0.00s].
sorting and recoding items ... [82 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.52s].
writing ... [0 rule(s)] done [0.01s].
creating S4 object  ... done [0.01s].
set of 0 rules 
Number of rules: 0.
Generate rules for glevel4:
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.5    0.1    1 none FALSE            TRUE       5     0.1      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 39 

set item appearances ...[1 item(s)] done [0.00s].
set transactions ...[106 item(s), 395 transaction(s)] done [0.00s].
sorting and recoding items ... [82 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [0.50s].
writing ... [0 rule(s)] done [0.01s].
creating S4 object  ... done [0.01s].
set of 0 rules 

Number of rules: 0.
Summary
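1. Because few students fall in level1 and level4, and no rules are found even with support lowered to 0.1, the analysis focuses on grades in level2~3.
2. The rules above indicate that "no past course failures, no extra educational support, wanting to pursue higher education, and internet access at home" are factors associated with higher grades.
3. In contrast, "being female, being of secondary-school age (15~17y), studying relatively little on one's own each week (2~5 hours), and attending the GP school" are associated with lower grades.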
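
The first point can be made a little more concrete with some arithmetic. This is a rough Python sketch, not part of the original analysis. A rule can never have more support than its right-hand side alone, so a grade level with too few students cannot produce any rule. The level3 and level2 counts below (169 and 149) are assumptions inferred from the reported lift values, and the sketch also assumes that every student falls into exactly one of the four levels.

```python
import math

n = 395
min_support = 0.1

level3_count = 169   # ASSUMPTION: inferred from lift of the level3 rules
level2_count = 149   # ASSUMPTION: inferred from lift of the level2 rules

# Students left over for level1 and level4 combined.
remaining = n - level3_count - level2_count

# Smallest whole number of students that reaches a support of 0.1.
needed_per_rule = math.ceil(min_support * n)

# If the leftover students are fewer than twice the requirement,
# level1 and level4 cannot both reach the support threshold.
both_levels_can_qualify = remaining >= 2 * needed_per_rule

print(remaining, needed_per_rule, both_levels_can_qualify)
```

Under these assumptions, at most one of the two levels could reach the support threshold at all, and any rule for it would also need a confidence of at least 0.5.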

Conclusion
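1. Gender and grades: female students perform lower in mathematics.
2. Family background and grades: family background (parents' education level & occupation) is an important factor in students' math performance.
3. Parents' occupation: having a father who works as a teacher can raise a student's grades, presumably because of stricter discipline and help with the child's studies.
4. Student's own motivation: wanting to continue to higher education and having internet access at home raise grades, presumably because these students are more interested in self-study and have online resources to learn from.
5. Course difficulty & age: little self-study time and younger age go with weaker grades; a possible reason is that the course content is relatively difficult for students in lower grades.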